Agentic builds: dsynth evidence capture hooks#1517
Open
tuxillo wants to merge 93 commits into
Open
Conversation
Add dsynth hook scripts that snapshot distilled build errors and relevant port metadata on failures, grouped by run, so debugging can stay build-driven without keeping full workdirs. Document the bounded evidence contract and the planned opencode integration/central queue model for asynchronous triage.
Add observe-only state server for remote UI integration: - REST API for runs, jobs, bundles, ports, artifacts - SSE event stream with replay support - SQLite persistence for full history - Filesystem reconciler for live updates Validated on DragonFlyBSD VM - all endpoints tested.
- Add vanilla JS Bootstrap 5 UI served by state-server - Live SSE event stream with replay/reconnect - Views: Overview, Events, Jobs, Runs, Ports, Bundles - Artifact viewer for markdown, diffs, logs - SSE improvements: after_id, tail query params, ts in payloads
- Add /bundles API endpoint listing recent bundles - Add #/bundles route with renderBundles() view - Add Bundles nav item to navbar - Update Phase 9 docs with completion status and new route
- agent-queue-runner: add apply job type and iteration tracking - apply-patch: add DragonFly local mode, --no-push flag, BSD-compatible patch - hook_common.sh: detect rebuild iterations, track previous bundles - Add KEDB entry for DragonFly source patch conventions
Makefiles use tabs, not spaces. The agent was generating patches with spaces which caused patch application failures. Added rule #8 to emphasize preserving exact whitespace from the bundle context.
When retrying a patch application, the branch may already exist from a previous failed attempt. Delete it first to allow the retry.
Stop extraction when hitting common section markers like 'Rationale', 'Files Modified', etc. Also detect when prose text starts after hunks. This prevents non-diff content from being included in patch.diff.
The agent was generating patches with incorrect hunk line counts. Added detailed instructions on unified diff format with example.
- Change dports-patch prompt to request complete file contents - Add extract_files_from_response() to parse FILE content blocks - Add generate_unified_diff() to create diffs programmatically - Add generate_combined_diff() for multi-file patches - Update write_patch_outputs() to try new format first, fallback to legacy This fixes the malformed diff issue - LLMs are good at generating file content but struggle with unified diff syntax and line counts.
The agent was outputting diff syntax inside FILE blocks for Makefile.DragonFly. Make it explicit that Makefile.DragonFly should be raw makefile content, while dragonfly/patch-* files are actual diffs. Also add specific hint for the IFM_IEEE80211_VHT5G error.
…er UI - Add activity_log and runner_status tables to state-server schema - Add /activity and /runner-status API endpoints with SSE events - Update agent-queue-runner to log activities at all job stages - Add heartbeat thread for runner liveness detection (5s interval) - UI: Add Activity Log panel showing last 10 runner activities - UI: Add Runner Status indicator with staleness detection (>15s) - UI: Add back button for artifact navigation in bundle view - UI: Hide session_id.txt files from artifact lists
…b error display - state-server: Only emit runner_status SSE events when status/job_id/stage changes, not on every heartbeat update_at change - app.js: Don't trigger full re-render for runner_status/activity events (fixes bundle tab reset issue), only re-render on overview page - app.js: Add renderJobDetail() with prominent error display and related activity log entries for failed jobs - agent-queue-runner: Write .job.error files before moving failed jobs, move error files along with job files
… + budget to phase 3 Phase 2 collapses to retiring apply-patch entirely after confirming its responsibilities already live elsewhere (iterative loop -> Phase 3 harness, PR creation -> existing process_pr_job). Phase 3 gains a trust-tier policy (AUTO/ASSIST/MANUAL) sourced from config/agentic-policy.json plus budget-bounded intra-job iteration (max iterations + max tokens via litellm response.usage). PR/push is intentionally out of scope of the iterative loop; the loop ends at a local rebuild_proof.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e note process_apply_job in agent-queue-runner was an 11-line stub never dispatched by the runner's job-type table; deletion is cosmetic. AGENTIC_BUILDS.md phase note drops the apply-patch reference and clarifies PR creation is out of scope of the iterative patch loop. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phases 1 and 2 are shipped. Phases 4 and 5 are deferred. The plan document is now a focused Phase 3 implementation plan: opencode + TS plugin retire in favor of a Python harness (litellm-based) under dportsv3.agent, with tools dispatching in-process to a refactored agentic-worker module. Trust-tier + budget policy from config/agentic-policy.json drives auto-iteration. Snippet rounds fold into the triage call. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New Python package under scripts/generator/dportsv3/agent/ that will host the litellm-based replacement for opencode. This commit lands the scaffolding only — nothing wires it into agent-queue-runner yet. Modules: - llm.py: litellm wrapper with normalized Response (text, tool_calls, usage) - prompts.py: TRIAGE_SYSTEM (verbatim from config/opencode/agent/dports-triage.md) - policy.py: load_policy / tier_for, applying confidence_floor downgrades - snippets.py: subprocess wrapper around scripts/snippet-extractor - triage.py: single-LLM-call flow with snippet rounds folded in-process Plus: - config/agentic-policy.json: AUTO/ASSIST/MANUAL tiers + classification map - pyproject.toml: new optional-dependency 'agent = ["litellm"]' litellm's only Rust-built transitive dep is pydantic-core, satisfied by the generator venv's --system-site-packages reading py311-pydantic-core from pkg. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hing/PR The agentic-worker's workspace concept (/build/synth/agentic-workspace, workspace.json, separate FPORTS/DPorts) overlaps with dev-env; pick dev-env as the single isolation primitive. The patch agent's tool surface is reimplemented on top of dev-env exec + writable overlay operations. Also retract phase 2's "keep process_pr_job for manual use" — the loop is purely local. Branches, commits, push, gh pr create all die. agentic-worker the standalone script also dies (596 LOC); functions live in dportsv3.agent.worker. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ubcommands, rebuild_proof schema - Concrete edits table now covers the previously missed cruft in agent-queue-runner: VM_SSH_* constants/env/dispatch (lines 17-21, 71-73, 721-761), DEFAULT_WORKSPACE_CONFIG + workspace.json loader (76, 167). - Step 2 gains two prerequisite dev-env subcommands: 'dportsv3 dev-env status NAME' (JSON readiness) and 'dportsv3 dev-env path NAME [--writable]' (~25 LOC total). The worker uses them as the only interface to dev-env state, no re-parsing of dev-env's internals. - Step 4 pins the new rebuild_proof.json schema: origin, rebuild_ok, dsynth_profile, build_command, timestamp_utc. No branch/head/fports fields. - Step 6 retitled to cover opencode + VM_SSH + workspace cruft in one cleanup pass; negative check grep extended accordingly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When DP_HARNESS_TRIAGE_MODEL is set in the environment, route triage through the in-process litellm harness instead of opencode. Snippet rounds fold into the harness call; the runner no longer re-enqueues for them. When DP_HARNESS_TRIAGE_MODEL is unset, the existing opencode path runs unchanged. - sys.path bootstrap so the standalone runner can import dportsv3.agent from scripts/generator/. - New _process_triage_job_harness helper carrying the harness path: build payload (unchanged), call dportsv3.agent.triage.run, write triage.json audit (classification, confidence, snippet_rounds, tokens, model, via), consult needs_user_context / should_enqueue_patch the same way the opencode path does. - New _write_triage_audit_harness writer for the new triage.json shape. The new triage.md on disk is written by the harness itself as it runs (needed so snippet-extractor can read the requests for the next round). Env vars used: DP_HARNESS_TRIAGE_MODEL (required to take this path), DP_HARNESS_TRIAGE_API_BASE, DP_HARNESS_TRIAGE_API_KEY, DP_HARNESS_TIMEOUT (default 120), DP_HARNESS_MAX_SNIPPET_ROUNDS (default 5). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two read-only subcommands the agent harness will use to query env
state and resolve host-side paths:
dportsv3 dev-env status NAME
Prints a single JSON line with name, target, origin, status,
backend, oracle_profile, root_mounted, env_dir.
dportsv3 dev-env path NAME [--writable]
Prints env_dir (default) or env_dir/writable (with --writable).
Backed by the existing EnvironmentStore.{load,env_dir,writable_dir}
and mounts.mounts_under. No new dev-env behavior.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New module dportsv3.agent.worker implementing the harness's tool surface on top of dev-env primitives. Step 2b lands the host-side functions (no chroot execution yet; chroot ops land in 2c). Path resolution: - EnvPaths dataclass + env_paths(env) — shells out to 'dportsv3 dev-env path NAME [--writable]' (cache once per job). - _resolve_chroot_path(paths, '/work/...') → host-side Path under env_dir/writable. Rejects paths outside /work/ and any .. escape via Path.relative_to. Tool functions: - env_verify(env) — wraps 'dportsv3 dev-env status NAME'; raises unless status=='ready' and root_mounted. - get_file(env, path) — base64-encoded read; returns sha256 + size. - put_file(env, path, content, encoding='text'|'base64', expected_sha256=None) — write with optimistic-lock check; preserves file mode on existing files. - emit_diff(env, origin, relpath) — git diff against HEAD in the env's DeltaPorts overlay; never commits or stages. - grep(env, pattern, path, include=None, max_bytes=8192) — rg over the writable overlay, output capped. Verified with a temp-filesystem + git-init test harness: get_file roundtrip, put_file text/base64/lock-match/lock-mismatch, emit_diff finding modifications, grep finding pattern matches. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…unted Two issues surfaced testing on dfly: 1. subprocess.run(['dportsv3', ...]) failed with FileNotFoundError because the generator venv's bin/ is not on PATH. Fix: invoke as [sys.executable, '-m', 'dportsv3', ...] so it uses the current interpreter's installed dportsv3 package regardless of PATH and skips the wrapper script's bootstrap. Override-able via DPORTSV3_CMD. 2. env_verify raised on root_mounted=false for envs in 'ready' state that hadn't been shelled into. That's a legitimate state — host- side tool ops operate on the writable overlay directly and 'dev-env exec' auto-mounts on demand. Drop the root_mounted gate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the six chroot-bound functions to dportsv3.agent.worker. All shell out via 'dportsv3 dev-env exec ENV -- CMD'; dev-env auto-mounts the env root on demand. - materialize_dports(env, origin): 'reapply ORIGIN' (existing dev-env helper wrapping dportsv3 compose). - extract(env, origin): 'make -C /work/DPorts/<origin> extract', then queries WRKDIR/WRKSRC via 'make -V' so the LLM can address files in the extracted source. - dupe(env, path): 'dupe PATH' (in-chroot tool; clones source file with .orig backup so genpatch can later produce a unified diff). - genpatch(env, path): 'genpatch PATH'; returns list of generated patch-* files from /work/genpatch-out/. - install_patches(env, origin, patches=None): host-side shutil.copy2 from <writable>/work/genpatch-out/ into <writable>/work/DeltaPorts/ports/<origin>/dragonfly/. No chroot exec needed since both source and destination are in the writable overlay. - dsynth_build(env, origin): 'dbuild ORIGIN' (existing dev-env helper); returns rc + stdout/stderr + rebuild_ok=(rc==0). Refinements per review: - env_paths() now @lru_cache'd so repeated tool calls in one attempt pay the dev-env subprocess cost once. - Module docstring documents the choice to drive dev-env via its CLI (stable contract) rather than importing EnvironmentStore (internals). The tool surface is now complete for step 3 (tools.py registry). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rtsv3
python -m dportsv3 skips the bash wrapper at the repo root that knows
how to dispatch the 'dev-env' subcommand to a separate venv. So
'python -m dportsv3 dev-env path 2026Q2' fails with "invalid choice:
dev-env" because the generator's argparse doesn't include it.
Resolution order for the wrapper:
1. DPORTSV3_CMD env var override
2. <repo>/dportsv3 sibling lookup (4 parents up from worker.py)
3. shutil.which('dportsv3') on PATH
Lazily resolved on first use so import never fails when the wrapper
isn't reachable (e.g. unit tests outside the repo).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
LLM-facing tool functions (materialize_dports, extract, dupe, genpatch,
dsynth_build, emit_diff) now return a uniform shape:
{
"ok": bool, # rc == 0
"rc": int,
"stdout_tail": str, # last 32KB if longer
"stderr_tail": str,
"stdout_truncated": bool,
"stderr_truncated": bool,
...tool-specific keys...,
}
Tail-preservation matters for build errors (the useful diagnostics
live at the end of the log, not the start). The LLM inspects 'ok' +
the tails to decide what to do — no more opaque RuntimeError(2000
chars of mounting INFO logs) burying the real make/dsynth error.
Infrastructure helpers (env_paths, env_verify) still raise — those
are fatal setup errors that don't make sense to surface to the LLM.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The dev-env chroot doesn't mount the dports tree at the conventional /usr/dports path; /work/DPorts is the writable overlay where compose materializes ports. Without PORTSDIR set, /usr/share/mk/bsd.port.mk fails to open /usr/dports/Mk/bsd.port.mk and 'make extract' dies before doing any work. Fix: pass PORTSDIR=/work/DPorts on every 'make' invocation (both the extract step and the WRKDIR/WRKSRC query). Define PORTSDIR as a module constant in worker.py so future make-based tools share it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…re writable DragonFly's bsd.port.mk defaults WRKDIRPREFIX=/usr/obj/dports for build artifacts, but /usr/obj is read-only in the dev-env chroot (it's part of the base mount). 'make extract' progressed past the PORTSDIR fix but then died with 'mkdir: /usr/obj/dports: Read-only file system' while creating .extract_done markers. Fix: point WRKDIRPREFIX at /work/obj (writable, under the env's writable overlay). Also pass BATCH=yes so the ports config dialog doesn't try to prompt for an unattached tty. Factored the common overrides (PORTSDIR, WRKDIRPREFIX, BATCH) into _make_vars() since future tools that call make will need the same. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 596-line standalone worker managed its own /build/synth/agentic- workspace/ (DeltaPorts, FPORTS, DPorts, workspace.json) and was invoked over SSH by config/opencode/tool/dports.ts. dportsv3.agent.worker (landed in 2b/2c) replaces it on top of dev-env primitives — same tool surface, no separate workspace concept, no SSH. The TS plugin at config/opencode/tool/dports.ts still references the old worker path; it's slated for deletion in step 6 (retire opencode). Until then, the opencode-driven patch path is broken — which is fine because the prior agentic stack had no working production users and the harness patch flow lands in step 4. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per the plan: no PRs, no branches, no push. The loop is purely local. process_pr_job's job (git push + gh pr create from rebuild_proof.json's deltaports_branch + deltaports_head fields) doesn't fit the new model: the harness never commits or branches in the env's writable overlay, so those fields won't exist in the new rebuild_proof.json schema either. Deleted: - process_pr_job function body (~102 LOC) - 'type == "pr"' dispatch arm in process_job - dry-run handling for type=pr in process_job Side effects to be cleaned in step 6 with the rest of the opencode/ workspace sweep: - DEFAULT_WORKSPACE_CONFIG and load_workspace_config remain because build_triage_payload still embeds workspace.json in the LLM prompt. Dead in spirit; step 6 sweep removes them along with VM_SSH and opencode plumbing. Enqueueing a type=pr job now hits the 'unknown job type' fallback, which is correct behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The awk-driven excision of process_pr_job in 316b4a1 produced a non-executable file. chmod 755 to restore. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dportsv3.agent.tools — 11 hand-written OpenAI-format tool schemas
matching the worker function surface, plus a dispatch helper:
- env arg bound by caller (patch.run in step 4), not exposed to LLM
- inspect-based arg validation (reject unexpected args; flag missing
required args)
- catch + surface worker exceptions as {ok: false, error: ...,
traceback: ...} so the LLM can recover on the next turn
- workers already return {ok, ...} dicts; passthrough preserved
dportsv3.agent.tool_loop — multi-turn driver:
- call llm.complete with messages + tool schemas
- on tool_calls: dispatch each, append assistant+tool messages,
re-call
- stop on text-only response or max_turns=20 safety cap
- returns (final_response, accumulated_usage)
The assistant message is rebuilt from our normalized Response
(role=assistant, content, tool_calls[]) rather than relying on
litellm's internal raw shape — provider-portable.
Verified with a stubbed LLM:
- 3-turn happy path: 2 tool calls then text-only; history shape
[system, user, assistant, tool, assistant, tool]; usage summed
- bad tool name surfaced as {ok: false, error: "unknown tool: ..."}
- missing required arg surfaced as {ok: false, error: "missing ..."}
- worker FileNotFoundError surfaced as {ok: false, error: "...",
traceback: "..."}
No new external deps.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DragonFly's py311-tokenizers tokenizers.abi3.so has missing DT_NEEDED entries (libonig, esaxx) — loading it fails at import time with chains of "Undefined symbol" errors that patchelf doesn't fully resolve. litellm transitively imports tokenizers for local cost calculation, but we don't need local token counting (usage totals come from response.usage.total_tokens). llm.py now tries to import tokenizers up-front. If that fails, inject a no-op stub into sys.modules with Tokenizer/Encoding so litellm's import chain succeeds. On platforms where tokenizers works (Linux, macOS), the stub never runs. The stub exposes only the surface litellm touches at import time (Tokenizer.from_pretrained, .from_file, .encode); calls return empty results. Cost calculation will be inaccurate or fall back to heuristics, which is fine because we never invoke litellm.cost_calculator. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
llm.py's tokenizers stub only fired when llm was imported — but the runner / tools modules / manual inspections can hit litellm without going through llm first. Moving the stub to dportsv3/agent/__init__.py makes it run as soon as any module under the package is imported. This unblocks invocations like: python -c "import dportsv3.agent; import litellm; ..." without needing to pre-import llm. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ting When litellm's model-name → provider heuristic mis-routes (e.g., any model name containing 'deepseek' or 'claude' is shunted to the native provider client even when openai/ prefix and api_base are set), custom_llm_provider forces a specific code path. Generic passthrough; default None means "let litellm pick from prefix as before." Set per flow: - agent-queue-runner: DP_HARNESS_TRIAGE_PROVIDER env var (DP_HARNESS_PATCH_PROVIDER will follow in step 4 when patch wires) - llm.complete(), tool_loop.run(), triage.run(): custom_llm_provider kwarg - _manual_test_tool_loop: DP_TEST_PROVIDER env var Native providers (anthropic/, deepseek/, nvidia_nim/, ...) work unchanged because they don't set custom_llm_provider. The override is only used when needed (most often: openai-compat third-party endpoints with model names that fool the heuristic). Also commits the manual test helper for tool_loop that was previously left untracked. Useful while step 4 is in flight. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thinking-mode providers (DeepSeek v4-pro/v4-flash directly or via opencode.ai/zen, OpenAI o-series via some relays) emit a reasoning_content field alongside content + tool_calls, holding the model's intermediate chain-of-thought. The upstream API requires this field to be passed back on the next request, or the multi-turn call fails with HTTP 400: "The reasoning_content in the thinking mode must be passed back to the API." Changes: - llm.Response gains optional reasoning_content field; llm.complete extracts it from msg.reasoning_content if present (None otherwise). - tool_loop._assistant_message_from includes reasoning_content in the reconstructed assistant message when set, so the next LLM request preserves continuity. No-op for non-thinking models — reasoning_content stays None, nothing extra is sent. Verified with stubbed Response objects: thinking-mode reconstructed message carries reasoning_content; non-thinking does not. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously every get_file result was base64. For UTF-8 text files
(Makefiles, patches, source, the bulk of what the agent reads), this
inflated content by ~33% AND made the model mentally decode base64
to find anything inside — burning prompt AND completion tokens.
Now: read bytes, try UTF-8 decode with a NUL-byte sanity check;
return {encoding: 'text', content: <str>} on success, fall back to
{encoding: 'base64', content: <b64>} for binary. sha256 is computed
over the raw bytes, so put_file's expected_sha256 round-trip works
regardless of encoding.
Verified with a temp-fs harness: text Makefile returns text;
PNG-header file returns base64.
Schema description updated so the LLM understands the dual-mode
return shape. Example path in description updated to /work/DPorts/...
(the common path; agent reads materialized port files from DPorts,
edits source-of-truth in DeltaPorts).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The patch agent now runs end-to-end through the harness instead of opencode.
New code:
- prompts.PATCH_SYSTEM: 4kB system prompt spelling out the dev-env's
three-tree layout (freebsd-ports / DeltaPorts / DPorts), tool
vocabulary, the repair loop, discipline rules (no commits/push/PRs),
and the mandatory output format ending in the new rebuild_proof.json
schema (origin, rebuild_ok, dsynth_profile, build_command,
timestamp_utc — no branch/head/fports fields).
- attempt_loop.run: budget-bounded retry around tool_loop. Each
attempt is a fresh [system, user] conversation (with a small failure-
context user turn appended on retries) so tool-call traces don't
compound across attempts. Stops on rebuild_ok=true, budget exhaustion,
or max_iterations. Returns PatchResult{status, final_text, usage,
attempts[], proof}.
- patch.run: thin wrapper over attempt_loop.run.
Runner wiring (mirrors step 1 triage adapter):
- New env vars: DP_HARNESS_PATCH_{MODEL,API_BASE,API_KEY,PROVIDER,
TIMEOUT}, DP_HARNESS_ENV (dev-env name default), DP_HARNESS_POLICY
(optional override of config/agentic-policy.json path).
- process_patch_job: when DP_HARNESS_PATCH_MODEL is set, route to
_process_patch_job_harness. It reads triage.md, resolves the tier
via policy.tier_for(classification, confidence), and calls
dportsv3.agent.patch.run with the tier's budget.
- Bundle outputs: analysis/patch.md (final LLM text), analysis/
rebuild_proof.json (parsed proof block), analysis/patch_audit.json
(status + tokens + per-attempt info + model), analysis/changes.diff
(host-side git diff vs HEAD in the env's DeltaPorts overlay).
Verified attempt_loop against a stubbed tool_loop:
- success on first attempt
- failure then success (failure-context message added to retry)
- budget exhausted mid-sequence
- needs-help after all attempts fail
- missing rebuild_proof JSON falls back to needs-help
End-to-end against a real LLM + env requires a manual smoke run with
DP_HARNESS_PATCH_MODEL + a bundle on disk; covered in the next message.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_manual_test_patch_flow.py fixtures a minimal bundle under /tmp (meta.txt, errors.txt, analysis/triage.md) and invokes dportsv3.agent.patch.run directly with a fabricated payload — bypassing the queue runner so the harness's loop is exercised in isolation against a real LLM + real dev-env. The fixture intentionally doesn't simulate a broken port; it asks the agent to verify the current state of the port via dsynth_build and emit rebuild_proof.json accordingly. Pointing at devel/readline (default) should reach rebuild_ok=true within 1-2 attempts. Env vars mirror _manual_test_tool_loop (DP_TEST_MODEL, ENV, ORIGIN, TIER_ITERATIONS, TIER_TOKENS, plus PROVIDER/API_BASE/API_KEY). The bundle dir is preserved on exit so you can inspect the artifacts the runner-side adapter would have written: patch.md, patch_audit.json, rebuild_proof.json, changes.diff (note: those are written by agent-queue-runner's _process_patch_job_harness, NOT by this fixture — this fixture only calls patch.run and reports the PatchResult). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dsynth's 'build' subcommand asks interactive questions (most commonly "Rebuild local repository? [Y/n]" before scanning, sometimes follow- ups during the build). The agent has no tty, so the subprocess sat in [ttyin] state and the patch flow hung — observed mid-test: load: 0.67 cmd: dsynth 31619 [ttyin] 0.00u 0.06s 0% 4128k Fixes: - worker._exec accepts optional input_text kwarg; default stdin is empty string (effectively /dev/null) so unexpected prompts fail fast rather than blocking. - worker.dsynth_build pipes 'y\\n' * 50 to stdin to clear dsynth's prompts. Generous enough for multi-question build cycles, cheap to send. dbuild (the dev-env helper) is unchanged — humans running it interactively still get the prompts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…_turns default Observed: a single attempt burned 2,073,090 tokens before attempt_loop's between-attempts budget check caught it. Root cause: tool_loop only enforced max_turns (30), not the token budget. The model went into a tool-call frenzy and attempt_loop only noticed after 30 turns of accumulating 70k-token contexts. Fixes: - tool_loop.run: new max_tokens kwarg; checked at the top of each turn before issuing the LLM call. When the running total reaches the cap, return whatever Response we have. Default 0 = no cap (callers should pass remaining budget). - attempt_loop.run: passes tier's remaining budget (max_tokens - tokens_used_so_far) as max_tokens to tool_loop on each attempt. Also short-circuits with status=budget-exhausted before kicking off a new attempt if the budget is already gone. - tool_loop max_turns default: 20 -> 12. A patch task taking more than ~12 tool calls per attempt is in trouble; the cap should stop it sooner. - attempt_loop max_tool_turns default: 30 -> 12. Verified with stubbed LLM: tool_loop stops at 1200 tokens when max_tokens=1200 (turn 3 was the first check after total>=cap). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the patch fixture run produces a surprising token count, we need to see what the model actually did — final_text alone tells us nothing if the loop ended on a tool call. _install_session_dump wraps llm.complete and tools.dispatch to write each turn as a JSON line to <bundle>/session.jsonl: - llm_call records: messages_preview (with long strings truncated to 800 chars), response.text (1200 chars), tool_calls, reasoning_content (600 chars), usage. - tool_dispatch records: tool name, arguments, ok flag, stdout/stderr tails truncated to 600 chars. Excludes result body (file bytes, full schemas) to keep the trace compact and shareable. After a run, share session.jsonl and the per-turn behavior is visible without re-running. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Goal
We are designing a system to automatically (agent-assisted) fix ports while keeping the existing, build-driven workflow intact:
dsynthstays the authoritative build executor.What this PR adds (foundation)
scripts/dsynth-hooks/:hook_run_start/hook_run_endgroup failures per build run and snapshot dsynth summary lists.hook_pkg_failurecreates a per-failure evidence bundle with:logs/errors.txt(high-signal extract, capped at 200KB)logs/full.log.gz(full log preserved for humans)port/*snapshot (Makefile/distinfo/pkg-plist/patches, etc.)meta.txtand basic dsynth profile/config snapshotsdocs/AGENTIC_BUILDS.mddescribing:What this PR does not do (yet)
Those are intentionally deferred so this PR can land the core evidence-capture mechanism safely and independently.
How to try it
scripts/dsynth-hooks/hook_*andscripts/dsynth-hooks/hook_common.shinto dsynth’s config base (/etc/dsynth/or/usr/local/etc/dsynth/) and making them executable.dsynthnormally.${Directory_logs}/evidence/runs/.../ports/.../for the evidence bundle.Why this matters for automated fixing
Reliable, size-capped evidence capture is the prerequisite for an automated port-fixing system:
errors.txt+ port metadata)dsynth-driven, and automation can be layered on without destabilizing build infrastructure